Speeding Up Speaker Diarization by Using Prosodic Features
Authors
Yan Huang, Gerald Friedland, Christian Müller, Nikki Mirghafori
International Computer Science Institute, Berkeley; Department of Computer Science, University of California, Berkeley
{yan,fractor,cmueller,nikki}@icsi.berkeley.edu
Abstract
In this article we present a method to speed up the agglomerative clustering used in speaker diarization by using long-term prosodic features. A set of these features is used to decide which clusters should be merged. This strategy reduces the number of decisions that have to be made with the more computationally intensive method based on the Bayesian Information Criterion (BIC). We show a 30% speedup of a state-of-the-art diarization system.
This work was partly funded by DTO VACE contract number NBCHC060157. Gerald Friedland and Christian Müller were supported by a fellowship within the postdoc program of the German Academic Exchange Service (DAAD).
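The idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which prosodic features are used or how they select merge candidates, so everything below is an assumption made for illustration. Each cluster is assumed to carry a list of 1-D frame values and a small hypothetical prosodic summary vector (for example mean pitch and pitch range). The hypothetical `pick_merge` helper ranks all cluster pairs by the cheap prosodic distance, keeps the `top_k` closest pairs, and evaluates a simplified single-Gaussian delta-BIC only on that shortlist.

```python
import math
from itertools import combinations
from statistics import pvariance


def _var(values):
    # Population variance with a small floor so log() never sees zero.
    return max(pvariance(values), 1e-9)


def delta_bic(a, b, lam=1.0):
    """Simplified 1-D single-Gaussian delta-BIC (textbook form, an assumption).

    Lower values mean the two clusters look more like one speaker.
    """
    n1, n2 = len(a), len(b)
    n = n1 + n2
    # Penalty for the extra Gaussian: 2 parameters (mean and variance).
    penalty = lam * 0.5 * 2 * math.log(n)
    gain = 0.5 * (n * math.log(_var(a + b))
                  - n1 * math.log(_var(a))
                  - n2 * math.log(_var(b)))
    return gain - penalty


def pick_merge(clusters, top_k=2):
    """Shortlist pairs by prosodic distance, then score only those with BIC.

    clusters: dict mapping a name to {"frames": [...], "prosody": [...]}.
    Returns (best_pair, best_score, bic_evaluations, total_pairs).
    """
    pairs = list(combinations(sorted(clusters), 2))
    # Cheap step: Euclidean distance between prosodic summary vectors.
    pairs.sort(key=lambda p: math.dist(clusters[p[0]]["prosody"],
                                       clusters[p[1]]["prosody"]))
    shortlist = pairs[:top_k]
    # Expensive step: delta-BIC only on the shortlisted pairs.
    scored = [(delta_bic(clusters[i]["frames"], clusters[j]["frames"]), (i, j))
              for i, j in shortlist]
    best_score, best_pair = min(scored)
    return best_pair, best_score, len(scored), len(pairs)


# Toy data; the numbers are made up for illustration only.
clusters = {
    "A": {"frames": [1.0, 1.2, 0.8, 1.1], "prosody": [120.0, 15.0]},
    "B": {"frames": [1.1, 0.9, 1.0, 1.3], "prosody": [125.0, 14.0]},
    "C": {"frames": [5.0, 5.2, 4.9, 5.1], "prosody": [210.0, 30.0]},
    "D": {"frames": [9.0, 9.1, 8.8, 9.2], "prosody": [180.0, 22.0]},
}
best_pair, best_score, n_bic, n_pairs = pick_merge(clusters, top_k=2)
print(best_pair, n_bic, "of", n_pairs, "pairs scored with BIC")
```

In this toy setup only 2 of the 6 possible pairs reach the BIC step, which is the kind of saving the abstract refers to. The choice of `top_k` trades speed against the risk of discarding the pair that BIC would have preferred.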
Similar resources
Prosodic and Phonetic Features for Speaker Clustering in Speaker Diarization Systems
This work is focused on speaker clustering methods that are used in speaker diarization systems. The purpose of speaker clustering is to associate together segments that belong to the same speaker and is usually applied in the last stage of the speaker-diarization process. We concentrate on developing proper representations of speaker segments for clustering. We realize two different speaker cl...
Using voice-quality measurements with prosodic and spectral features for speaker diarization
Jitter and shimmer voice-quality measurements have been successfully used to detect voice pathologies and classify different speaking styles. In this paper, we investigate the usefulness of jitter and shimmer voice measurements in the framework of the speaker diarization task. The combination of jitter and shimmer voice-quality features with the long-term prosodic and short-term spectral feature...
The Detection of Overlapping Speech with Prosodic Features for Speaker Diarization
Overlapping speech is responsible for a certain amount of the errors produced by standard speaker diarization systems in meeting environments. We are investigating a set of prosody-based long-term features as a potential complement to our overlap detection system relying on short-term spectral parameters. The most relevant features are selected in a two-step process. They are firstly evaluated and s...
Detection and Handling of Overlapping Speech for Speaker Diarization
This thesis concerns the detection of overlapping speech segments and its further application for the improvement of speaker diarization performance. We propose the use of three spatial cross-correlation-based parameters for overlap detection on distant microphone channel data. Spatial features from different microphone pairs are fused by means of principal component analysis or by an approach in...
Developing On-Line Speaker Diarization System
In this paper we describe the process of converting a research prototype system for Speaker Diarization into a fully deployed product running in real time and with low latency. The deployment is a part of the IBM Cloud Speech-to-Text (STT) Service. First, the prototype system is described and the requirements for the on-line, deployable system are introduced. Then we describe the technical appr...
Publication date: 2014